Smoothing Methods and Cross-Language Document Re-ranking

Authors

  • Dong Zhou
  • Vincent P. Wade
Abstract

This paper reports on our participation in the CLEF 2009 monolingual and bilingual ad hoc TEL@CLEF tasks involving three languages: English, French and German. Language modeling was adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the monolingual tasks was to compare different smoothing strategies and investigate the effectiveness of each. The retrieval model was then combined, for both the monolingual and bilingual tasks, with a document re-ranking method based on Latent Dirichlet Allocation (LDA), which exploits the implicit structure of the documents with respect to the original queries. Experimental results showed that the three smoothing strategies behave differently across the test languages, while the LDA-based document re-ranking method requires further study if it is to bring significant improvement over the baseline language modeling systems in the cross-language setting.
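
The abstract does not name the three smoothing strategies that were compared. As an illustration only, the sketch below assumes the three methods most often studied for language-model retrieval (Jelinek-Mercer, Dirichlet prior, and absolute discounting) and shows how each one mixes a document model with a collection model when scoring a query. The function name `query_log_likelihood` and the default values of `lam`, `mu` and `delta` are hypothetical choices, not values taken from the paper.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len,
                         method="dirichlet", lam=0.5, mu=2000.0, delta=0.7):
    """Log P(query | doc) under a smoothed unigram language model.

    query, doc   : lists of tokens (doc is assumed to be non-empty)
    coll_counts  : term -> count over the whole collection
    coll_len     : total number of tokens in the collection
    method       : "jm", "dirichlet" or "absolute" (assumed names)
    """
    tf = Counter(doc)
    dlen = len(doc)
    log_p = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0) / coll_len  # collection model P(w|C)
        if method == "jm":
            # Jelinek-Mercer: fixed linear interpolation
            p = (1 - lam) * tf[w] / dlen + lam * p_coll
        elif method == "dirichlet":
            # Dirichlet prior: interpolation weight depends on document length
            p = (tf[w] + mu * p_coll) / (dlen + mu)
        elif method == "absolute":
            # Absolute discounting: subtract delta from each seen count and
            # give the freed mass to the collection model
            p = max(tf[w] - delta, 0) / dlen + delta * len(tf) / dlen * p_coll
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        if p <= 0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p
```

With sparse records such as short catalogue entries, most query terms are absent from any given document, so the collection term in each formula is what keeps the score finite and lets documents be ranked at all.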

Similar resources

Language Modeling and Document Re-Ranking: Trinity Experiments at TEL@CLEF-2009

This paper presents a report on our participation in the CLEF-2009 monolingual and bilingual ad hoc TEL@CLEF tasks involving three different languages: English, French and German. Language modeling is adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the mo...

YJST at the NTCIR-12 MobileClick-2 Task

The Yahoo Japan Search Technology (YJST) team participated in the Japanese iUnit Ranking and Summarization subtasks of NTCIR-12 MobileClick-2. For the iUnit Ranking subtask, we adopted an LM-based approach, implemented on top of the organizers’ baseline system. We examined language-model-based iUnit ranking using both KL-divergence and negative cross entropy with several model smoothing meth...

The Smoothed Dirichlet Distribution: Understanding Cross-entropy Ranking in Information Retrieval

September 2006. Ramesh M. Nallapati (B.Tech., Indian Institute of Technology, Bombay; M.S., University of Massachusetts Amherst; M.S., University of Massachusetts Amherst; Ph.D., University of Massachusetts Amherst). Directed by: Prof. James Allan. Unigram language modeling is a successful probabilistic fr...

The Effectiveness of Results Re-Ranking and Query Expansion in Cross-language Information Retrieval

This paper presents the technical details and experimental results of the information retrieval system with which we participated in the NTCIR-8 ACLIA (Advanced Cross-language Information Access) IR4QA (Information Retrieval for Question Answering) task. Document corpora in Simplified Chinese (CS) and Traditional Chinese (CT), with topics in English, CS and CT, were used in our experiments. We com...

Dedicated Backing-Off Distributions for Language Model Based Passage

Passage retrieval is an essential part of question answering systems. In this paper we use statistical language models to perform this task. Previous work has shown that language modeling techniques provide better results for both document and passage retrieval. The motivation behind this paper is to define new smoothing methods for passage retrieval in question answering systems. The final ob...

Publication date: 2009